Abstract. Based on an analysis of the literature and on their own experience in bioinformatics and virology
research, the authors propose multistep data processing algorithms designed to assist the production of a
SARS-CoV-2 epitope vaccine.

Epitope vaccines are expected to provoke a weaker but safer response in the vaccinated person. The methodologies
of reverse bioengineering, vaccinology and synthetic peptide manufacturing have a promising future in combating
the severe disease COVID-19.

The significant mutational variability and evolution of SARS-CoV-2, which is more typical of natural
animal-borne viruses, is a hurdle to effective and robust vaccine application and therefore requires
multidisciplinary research and prevention measures at the level of international cooperation.

However, we can expect that other viruses of a different nature and composition may be labelled as SARS-CoV-2.
Metagenomics is therefore an important discipline for COVID-19 detection.

High-quality, reliable virus detection remains an open problem that calls for improvement and optimization.

It is of utmost importance to develop in silico and in vitro methods for predicting and monitoring the reaction
of the vaccine recipient, as techniques of so-called modern personalized medicine.

Many questions cannot be resolved by in silico techniques alone and can only be answered in vitro and
in vivo, which demands significant investments of time and money.

Future experiments should also be directed at the discovery of optimal vaccine adjuvants, vectors and epitope
ensembles, as well as at the individual characteristics of the inhabitants of a given region. This research will
require several more years of meticulous large-scale laboratory and clinical work in various centers of
biomedical institutions worldwide.

The work was carried out with the support of the Belarusian Republican Foundation for Fundamental Research
within the framework of the project: "... of the vector platform of the enteric adenovirus",
No. ДР 20210889 of 26.04.2021.


Keywords: SARS-CoV-2, COVID-19, epitope vaccine, medical cybernetics, bioinformatics, genomics, algorithms.


Introduction, problem and objectives review

The development of algorithms for 
processing data from genomes of especially 
dangerous viruses has already been partially 
covered in scientific publications by authors 
from the largest research centers of virology 
and tropical infections. 

Several sources present algorithms for the analysis of genomic texts of the viruses that
cause Zika, West Nile fever, Chikungunya, Nipah, Ebola and African swine fever, as well as
MERS (Middle East respiratory syndrome) and SARS (severe acute respiratory syndrome),
which preceded the current COVID-19 (coronavirus disease 2019) pandemic.

Over the past six months, a large number of articles and monograph chapters have appeared
in print on algorithms for processing genomic data with the goal of the rational design
of an anti-SARS-CoV-2 peptide vaccine [1-13].

SARS-CoV-2 is a modern plague, 
which has been killing people worldwide 
since 2019. The estimated number of related 
deaths is about 5 000 000. 

The exact origin of the virus is still unknown. Several hypotheses propose deforestation
and zoonoses, together with leakages of recombinant and artificial viruses, as possible
causes of the pandemic.

Thus, the development of effective and safe vaccines, chemoprophylaxis and therapy is
of utmost importance today.


ISSN 2710-1673. Artificial Intelligence. 2021. No. 2




Methodology 

Reverse vaccinology makes it possible to use genome texts to define and select the
essential genome regions by applying modern bioinformatics and genomics software and
algorithms. The final goals are synthetic peptide bioengineering and vaccine design,
production, testing and practical application, with an expected protective effect and
benefit for the recipients.

Viral genome texts are produced by processing the signals of sequencing machines
(Illumina, Ion Torrent, Oxford Nanopore MinION, Pacific Biosciences, etc.), and the
resulting raw data are passed through a multistep bioinformatics data processing
pipeline.
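As an illustration of the first stage of such a pipeline, the following minimal sketch reads raw FASTQ records and keeps only the reads whose mean base quality passes a threshold. It is not the authors' implementation: the function names, the threshold of 20 and the Phred+33 quality encoding are assumptions made for the example.

```python
from io import StringIO
from typing import Iterator, List, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from a FASTQ stream with four lines per record."""
    while True:
        header = handle.readline().strip()
        if not header:  # end of file (or a blank line) stops the reader
            return
        sequence = handle.readline().strip()
        handle.readline()  # the '+' separator line is ignored
        quality = handle.readline().strip()
        yield header[1:], sequence, quality


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of one read; Phred+33 encoding is an assumption."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def filter_reads(handle: TextIO, min_mean_q: float = 20.0) -> List[Tuple[str, str]]:
    """Keep reads whose mean quality is at least min_mean_q (hypothetical threshold)."""
    return [
        (read_id, seq)
        for read_id, seq, qual in read_fastq(handle)
        if qual and mean_phred(qual) >= min_mean_q
    ]


if __name__ == "__main__":
    # Two toy reads: 'I' encodes Phred 40, '!' encodes Phred 0.
    demo = StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\n!!!!\n")
    print(filter_reads(demo))  # [('r1', 'ACGT')]
```

In practice dedicated quality-control and trimming tools would be used; the sketch only shows what "quality of the input data" means at the level of a single read.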

The final result of the data processing is a set of annotated genome contigs, in which
biologically and pathogenetically relevant terms and definitions are assigned to the
input chunks of nucleotide or amino acid words.
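A very reduced form of annotation can be sketched as follows: the code scans a contig for open reading frames (ORFs) and records their coordinates, which is one way of attaching a biological label to a chunk of nucleotide text. This is an assumption-laden illustration, not the annotation procedure of the paper: it searches the forward strand only, uses the standard stop codons, and the function name and the `min_codons` cut-off are hypothetical.

```python
from typing import Dict, List

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def find_orfs(contig: str, min_codons: int = 3) -> List[Dict[str, int]]:
    """Return forward-strand ORFs as dicts with 0-based start, exclusive end and frame."""
    contig = contig.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(contig) - 2, 3):
            codon = contig[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                # Count codons from the start codon up to, not including, the stop.
                if (i - start) // 3 >= min_codons:
                    orfs.append({"start": start, "end": i + 3, "frame": frame})
                start = None
    return orfs


if __name__ == "__main__":
    # Toy contig with one ORF: ATG AAA TTT GGG TAA, beginning at index 2.
    print(find_orfs("CCATGAAATTTGGGTAACC"))
    # [{'start': 2, 'end': 17, 'frame': 2}]
```

Real annotation tools add homology search, functional databases and reverse-strand analysis on top of such coordinate finding.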

Therefore, we have to understand that genomic data processing is an error-prone
activity that depends heavily on the quality of the input data, on the choice of
de novo assembly algorithms together with their options and settings, and on the
relevance of the contig annotations.
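One way to see how assembly settings change the outcome is to compare simple summary statistics of the resulting contigs. The sketch below computes N50 and GC content; the choice of these two metrics is an assumption of this example, since the text does not name a specific quality measure.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Length L such that contigs of length >= L contain at least half of all assembled bases."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contigs given")


def gc_content(sequence: str) -> float:
    """Fraction of G and C bases in a nucleotide sequence."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in sequence) / len(sequence)


if __name__ == "__main__":
    # Two hypothetical assemblies of the same reads with different settings.
    print(n50([100, 200, 300, 400]))  # 300
    print(n50([10, 10, 10, 1000]))    # 1000
    print(gc_content("ATGC"))         # 0.5
```

A higher N50 does not by itself mean a more correct assembly, so such numbers complement, rather than replace, inspection of the annotated contigs.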


Results and discussion 

Based on 13 years of experience in biology and the related bioinformatics computations,
and on the extensive modern literature on reverse vaccinology and next-generation
sequencing data processing, we have developed data processing algorithms for the design
of SARS-CoV-2 epitope compounds (Figures 1-4).

The proposed algorithms may be implemented with various software, and it is therefore
realistic to expect variations in the resulting epitope amino acid sequences.
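The expected variation between implementations can be quantified, for instance, by the overlap of the epitope sets that two tools return for the same input. The following sketch uses the Jaccard index for this purpose; the metric, the function name and the peptide strings are illustrative assumptions, and the peptides are made-up placeholders rather than real predictions.

```python
from typing import Iterable


def jaccard(epitopes_a: Iterable[str], epitopes_b: Iterable[str]) -> float:
    """Jaccard index of two epitope sets: shared peptides divided by all distinct peptides."""
    set_a, set_b = set(epitopes_a), set(epitopes_b)
    if not set_a and not set_b:
        return 1.0  # two empty predictions are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


if __name__ == "__main__":
    # Placeholder outputs of two hypothetical prediction tools.
    tool_a = ["AKLMNPQRS", "GHIKLMNPQ", "TVWYACDEF"]
    tool_b = ["AKLMNPQRS", "TVWYACDEF", "CDEFGHIKL"]
    print(jaccard(tool_a, tool_b))  # 0.5
```

A low overlap would indicate that the choice of software materially changes the candidate list and that the candidates shared by several tools deserve priority.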

Figure 4 shows how the genome contigs 
can be annotated to disclose their 
physiological and biological functions. 

Once the elements of genome texts that represent epitopes which are highly immunogenic
and at the same time non-allergenic to the host have been defined, they are passed on
to solid-phase peptide synthesis to produce vaccine components.

However, the generated synthetic proteins are not a complete vaccine preparation but
only an artificial analogue of the actual virus.
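How candidate epitope elements might be picked out of a protein text can be illustrated with a sliding-window scan. The sketch below ranks windows by mean Kyte-Doolittle hydropathy and returns the most hydrophilic ones, on the assumption that hydrophilic stretches are more likely to be surface exposed. This is a deliberately crude stand-in chosen for the example: it does not predict immunogenicity or allergenicity, and the window width and function names are hypothetical.

```python
from typing import List, Tuple

# Kyte-Doolittle hydropathy scale (negative values are hydrophilic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_scores(protein: str, width: int = 9) -> List[Tuple[int, str, float]]:
    """Return (start, peptide, mean hydropathy) for every window of the given width.

    Raises KeyError for residues outside the 20 standard amino acids.
    """
    scores = []
    for start in range(len(protein) - width + 1):
        peptide = protein[start:start + width]
        mean = sum(KYTE_DOOLITTLE[aa] for aa in peptide) / width
        scores.append((start, peptide, mean))
    return scores


def most_hydrophilic(protein: str, width: int = 9, top: int = 3) -> List[Tuple[int, str, float]]:
    """The `top` windows with the lowest mean hydropathy, i.e. the most hydrophilic ones."""
    return sorted(window_scores(protein, width), key=lambda item: item[2])[:top]


if __name__ == "__main__":
    toy_protein = "I" * 9 + "K" * 9  # hydrophobic half followed by a hydrophilic half
    start, peptide, score = most_hydrophilic(toy_protein, width=9, top=1)[0]
    print(start, peptide, round(score, 2))  # 9 KKKKKKKKK -3.9
```

Dedicated B-cell and T-cell epitope predictors, followed by allergenicity and toxicity filters, would replace this scoring function in a real workflow.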

Further experiments are required and are expected to answer the following questions
about successful compositions of a multiepitope vaccine:

1. How do the synthetic peptides affect the immune response of mammals, including
the human host, and how does this response compare with the natural response to
the actual SARS-CoV-2 virus and with the response to other vaccines?

2. How do few-epitope and multiepitope vaccines compare, and can an epitope overload
provoke excessive immune reactions, such as the previously described cytokine
storm, anaphylaxis and degenerative autoimmune reactions?

3. How do vaccine adjuvants and vectors change the immune response and the efficacy
of disease prevention in vitro and in vivo?

These are only a few of the questions that need to be answered urgently.

The quality of the initial data is of decisive importance for the in silico
bioinformatics analysis of genomes. Errors in reference databases can lead to incorrect
amino acid substitutions, which will result in the synthesis of an ineffective immunogen
that cannot induce antibodies capable of binding the viral antigens and so cannot
produce a sufficiently strong antiviral immune response.
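The effect of a single erroneous base can be made concrete with a short sketch: a reference coding sequence and an observed one are translated with the standard genetic code, and the positions at which the peptides differ are reported. The function names and the toy sequences are assumptions made for the example, not data from the paper.

```python
from itertools import product
from typing import List, Tuple

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(sequence: str) -> str:
    """Translate a coding sequence in frame 0; RNA input is accepted, trailing bases are dropped."""
    dna = sequence.upper().replace("U", "T")
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def substitutions(reference: str, observed: str) -> List[Tuple[int, str, str]]:
    """List (1-based peptide position, reference residue, observed residue) for each mismatch."""
    ref_peptide, obs_peptide = translate(reference), translate(observed)
    return [
        (position + 1, ref_aa, obs_aa)
        for position, (ref_aa, obs_aa) in enumerate(zip(ref_peptide, obs_peptide))
        if ref_aa != obs_aa
    ]


if __name__ == "__main__":
    reference = "ATGGATTAC"                        # translates to M-D-Y
    print(translate(reference))                    # MDY
    print(substitutions(reference, "ATGGGTTAC"))   # [(2, 'D', 'G')]  A->G changes the residue
    print(substitutions(reference, "ATGGACTAC"))   # []  T->C is synonymous
```

The example shows why a database error does not always matter (synonymous changes) and why, when it does, the synthesized peptide differs from the viral one at a precise position.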


Conclusion 

On the basis of data from the scientific literature and the experience of computational
experiments, algorithms for processing coronavirus genomes have been developed for the
purposes and tasks of modern immunoinformatics, vaccinomics and virology. The algorithms
can be applied and adapted to develop epitope vaccines against highly dangerous viruses
of various origins. Various software packages, their components, ensembles and complexes
can be used to implement the developed algorithms. Viral vaccine development is a
research and manufacturing endeavor that is highly demanding of time and money; it
requires serious quality control measures as well as the registration and reporting of
unexpected events and complications. The benefits of vaccination are not guaranteed
for the whole population of recipients.


Figure 1. Venn diagram showing the interdisciplinary
nature of SARS-CoV-2 epitope vaccine
development


The major role of scientific efforts is to 
assist the development of safe vaccines to 
reduce the number of vaccine-related deaths 
and severe health harms. 


1. Selection, downloading and analysis of the content and quality of genomic texts of
the complete genome or of individual genes.

2. De novo genome assembly from FASTQ files of metagenomes or of the original genomes
or transcriptomes of viruses.

3. Annotation of the selected genomic texts.

4. Single and multiple alignment of genomes with each other and with references,
genotyping, identification of strains, construction of phylogeny and of maps of
similarity and difference.

5. Analysis of gene variability, identification of new gene functions, comparison of
features.

6. Formation of the input data for generating sequences of epitopes.

7. Computation and statistical selection of linear and conformational B-cell,
interferon and T-cell epitopes.

8. Structural, functional, pathogenetic, allergological and toxicological analysis and
selection of the filtered genomic epitope texts.

9. In silico combining of epitopes with linkers and adjuvants, modeling of the
interaction of the vaccine with the host receptors.

10. Solid-phase synthesis of vaccine compounds, manufacturing of the vaccine
preparation, laboratory and clinical trials, implementation in healthcare practice.

Figure 2. Technical presentation of the principal
algorithm for the development of an epitope vaccine

Figure 3. Biomedical presentation of the general
algorithm for the development of an epitope vaccine

Figure 4. Genome annotation classification
and software
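The step of the algorithm in Figure 2 that combines epitopes with linkers and adjuvants in silico can be sketched as plain string assembly. The linker sequences GPGPG and EAAAK used below are common choices in the multiepitope vaccine literature but are not named in this paper, so they, the function name and the toy peptides are assumptions of the example.

```python
from typing import Sequence


def assemble_construct(
    epitopes: Sequence[str],
    linker: str = "GPGPG",           # assumed flexible linker between epitopes
    adjuvant: str = "",              # optional adjuvant peptide placed at the N-terminus
    adjuvant_linker: str = "EAAAK",  # assumed rigid linker after the adjuvant
) -> str:
    """Concatenate epitopes with a linker and optionally prepend an adjuvant peptide."""
    if not epitopes:
        raise ValueError("at least one epitope is required")
    body = linker.join(epitopes)
    if adjuvant:
        return adjuvant + adjuvant_linker + body
    return body


if __name__ == "__main__":
    # Toy peptides, not real epitopes.
    print(assemble_construct(["AAA", "CCC"]))                  # AAAGPGPGCCC
    print(assemble_construct(["AAA", "CCC"], adjuvant="MMM"))  # MMMEAAAKAAAGPGPGCCC
```

The order of epitopes and the choice of linker are themselves design variables; the assembled sequence is the input to the structural modeling and receptor docking mentioned in the same step.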


M.V. Sprindzuk, A.S. Vladyko, L.P. Titov 
Data processing algorithms for the in silico 
SARS-CoV-2 epitope prediction and 
vaccine development 

Based on an analysis of the literature and on their own experience in bioinformatics
and virology research, the authors propose multistep data processing algorithms
designed to assist the production of a SARS-CoV-2 epitope vaccine. Many questions
cannot be resolved by in silico techniques alone and can only be answered in vitro
and in vivo, which demands significant investments of time and money.

List of terms and definitions (with additions and changes from [14]):


antigenicity - the property to induce an immune response;

antigens - inducers of the immune response, modifiers of the body's
immunological reactivity;

genome - a complete set of genes that determine all the properties of an organism;

genetic recombination - the process of formation of genomes containing genetic
material from two or more parental forms;

genotype - a set of active and inactivated genes that are the principal part
of the factors of heredity of the organism;

genetic engineering - a branch of genetics that develops techniques for
manipulating nucleic acids and uses these methods for genetic research and for
obtaining organisms with mixed genomes;

gene pool - a set of genotypes of individuals representing a population, a
type of microorganism and other organic forms;

genophore - a carrier of genes;

identification of viruses - the laboratory and bioinformatics process of
determining the systematic position of an unknown virus strain down to the type
or variant;

isolates - cultures of viruses or other microbes isolated from a specific source;

immunization - a way to artificially create immunity;

immunity - a complex of protective and adaptive reactions and adaptations aimed at
maintaining the constancy of the antigenic composition of the internal environment
of the macroorganism by killing, and by other types of neutralization and removal
of, foreign objects of antigenic nature;

humoral immunity - immunity, the major effectors of which are antibodies;

cellular immunity - immunity, the principal effectors of which are sensitized
lymphocytes and the lymphokines produced by them;
acquired immunity - a form of immunity that is acquired in the process of
individual development of the organism as a result of contact with parasites and
substances of antigenic nature;

interferons - a class of inducible low-molecular-weight alpha-helical proteins of
vertebrates, possessing antiviral and other activities within the species to which
the producer of interferons belongs;

infection - a set of pathological, adaptive and reparative reactions of an
organism resulting from its competitive interaction with microbes;

pathogenicity of viruses - the species-level potential ability of viruses to cause
an infectious process in their hosts;

reinfection - recurrent infection of an organism that has suffered a disease with
the same or another variant of a certain type of pathogen;

superinfection - infection of a patient with the same or another variant of the
same pathogen during the course of the disease;

transcription - the process of transferring genetic information from the
genome to messenger RNA (mRNA);

translation - the process of formation of a polypeptide chain on mRNA associated
with polyribosomes;

transfection - infection of cells by introducing genomic and subgenomic
molecules of nucleic acids.